Goto

Collaborating Authors

 impartial game


Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach

Riis, Søren

arXiv.org Artificial Intelligence

This paper provides a theoretical framework that validates and explains the results in the work with Bei Zhou experimentally finding that AlphaZero-style reinforcement learning algorithms struggle to learn optimal play in NIM, a canonical impartial game proposed as an AI challenge by Harvey Friedman in 2017. Our analysis resolves a controversy around these experimental results, which revealed unexpected difficulties in learning NIM despite its mathematical simplicity compared to games like chess and Go. Our key contributions are as follows: We prove that by incorporating recent game history, these limited AlphaZero models can, in principle, achieve optimal play in NIM. We introduce a novel search strategy where roll-outs preserve game-theoretic values during move selection, guided by a specialised policy network. We provide constructive proofs showing that our approach enables optimal play within the \(\text{AC}^0\) complexity class despite the theoretical limitations of these networks. This research demonstrates how constrained neural networks when properly designed, can achieve sophisticated decision-making even in domains where their basic computational capabilities appear insufficient.


Exploring Parity Challenges in Reinforcement Learning through Curriculum Learning with Noisy Labels

Zhou, Bei, Riis, Soren

arXiv.org Artificial Intelligence

This paper delves into applying reinforcement learning (RL) in strategy games, particularly those characterized by parity challenges, as seen in specific positions of Go and Chess and a broader range of impartial games. We propose a simulated learning process, structured within a curriculum learning framework and augmented with noisy labels, to mirror the intricacies of self-play learning scenarios. This approach thoroughly analyses how neural networks (NNs) adapt and evolve from elementary to increasingly complex game positions. Our empirical research indicates that even minimal label noise can significantly impede NNs' ability to discern effective strategies, a difficulty that intensifies with the growing complexity of the game positions. These findings underscore the urgent need for advanced methodologies in RL training, specifically tailored to counter the obstacles imposed by noisy evaluations. The development of such methodologies is crucial not only for enhancing NN proficiency in strategy games with significant parity elements but also for broadening the resilience and efficiency of RL systems across diverse and complex environments.


Impartial Games: A Challenge for Reinforcement Learning

Zhou, Bei, Riis, Søren

arXiv.org Artificial Intelligence

While AlphaZero-style reinforcement learning (RL) algorithms excel in various board games, in this paper we show that they face challenges on impartial games where players share pieces. We present a concrete example of a game - namely the children's game of Nim - and other impartial games that seem to be a stumbling block for AlphaZero-style and similar self-play reinforcement learning algorithms. Our work is built on the challenges posed by the intricacies of data distribution on the ability of neural networks to learn parity functions, exacerbated by the noisy labels issue. Our findings are consistent with recent studies showing that AlphaZero-style algorithms are vulnerable to adversarial attacks and adversarial perturbations, showing the difficulty of learning to master the games in all legal states. We show that Nim can be learned on small boards, but the learning progress of AlphaZero-style algorithms dramatically slows down when the board size increases. Intuitively, the difference between impartial games like Nim and partisan games like Chess and Go can be explained by the fact that if a small part of the board is covered for impartial games it is typically not possible to predict whether the position is won or lost as there is often zero correlation between the visible part of a partly blanked-out position and its correct evaluation. This situation starkly contrasts partisan games where a partly blanked-out board position typically provides abundant or at least non-trifle information about the value of the fully uncovered position.


Nimber-Preserving Reductions and Homomorphic Sprague-Grundy Game Encodings

Burke, Kyle, Ferland, Matthew, Teng, Shanghua

arXiv.org Artificial Intelligence

The concept of nimbers--a.k.a. Grundy-values or nim-values--is fundamental to combinatorial game theory. Nimbers provide a complete characterization of strategic interactions among impartial games in their disjunctive sums as well as the winnability. In this paper, we initiate a study of nimber-preserving reductions among impartial games. These reductions enhance the winnability-preserving reductions in traditional computational characterizations of combinatorial games. We prove that Generalized Geography is complete for the natural class, $\cal{I}^P$ , of polynomially-short impartial rulesets under nimber-preserving reductions, a property we refer to as Sprague-Grundy-complete. In contrast, we also show that not every PSPACE-complete ruleset in $\cal{I}^P$ is Sprague-Grundy-complete for $\cal{I}^P$ . By considering every impartial game as an encoding of its nimber, our technical result establishes the following striking cryptography-inspired homomorphic theorem: Despite the PSPACE-completeness of nimber computation for $\cal{I}^P$ , there exists a polynomial-time algorithm to construct, for any pair of games $G_1$, $G_2$ of $\cal{I}^P$ , a prime game (i.e. a game that cannot be written as a sum) $H$ of $\cal{I}^P$ , satisfying: nimber($H$) = nimber($G_1$) $\oplus$ nimber($G_2$).


Winning the War by (Strategically) Losing Battles: Settling the Complexity of Grundy-Values in Undirected Geography

Burke, Kyle, Ferland, Matthew, Teng, Shanghua

arXiv.org Artificial Intelligence

We settle two long-standing complexity-theoretical questions-open since 1981 and 1993-in combinatorial game theory (CGT). We prove that the Grundy value (a.k.a. nim-value, or nimber) of Undirected Geography is PSPACE-complete to compute. This exhibits a stark contrast with a result from 1993 that Undirected Geography is polynomial-time solvable. By distilling to a simple reduction, our proof further establishes a dichotomy theorem, providing a "phase transition to intractability" in Grundy-value computation, sharply characterized by a maximum degree of four: The Grundy value of Undirected Geography over any degree-three graph is polynomial-time computable, but over degree-four graphs-even when planar and bipartite-is PSPACE-hard. Additionally, we show, for the first time, how to construct Undirected Geography instances with Grundy value $\ast n$ and size polynomial in n. We strengthen a result from 1981 showing that sums of tractable partisan games are PSPACE-complete in two fundamental ways. First, since Undirected Geography is an impartial ruleset, we extend the hardness of sums to impartial games, a strict subset of partisan. Second, the 1981 construction is not built from a natural ruleset, instead using a long sum of tailored short-depth game positions. We use the sum of two Undirected Geography positions to create our hard instances. Our result also has computational implications to Sprague-Grundy Theory (1930s) which shows that the Grundy value of the disjunctive sum of any two impartial games can be computed-in polynomial time-from their Grundy values. In contrast, we prove that assuming PSPACE $\neq$ P, there is no general polynomial-time method to summarize two polynomial-time solvable impartial games to efficiently solve their disjunctive sum.